

WE CLAIM:

1. A method of performing a  $k$ -stage FFT (fast Fourier transform) computation of  $N$  data points wherein  $k$  and  $N$  are integers, the method comprising:

5 performing a plurality of operations upon the  $N$  data points, each one of the plurality of operations comprising;

performing an import of a respective plurality of sets of data points of the  $N$  data points from an external memory into an internal memory;

10 performing a FFT computation of the  $k$ -stage FFT computation upon each one of the sets of data points in the internal memory; and

15 performing an export of the respective plurality of sets of data points of the  $N$  data points from the internal memory into the external memory to update the respective sets of data points of the  $N$  data points in the external memory.

2. A method of performing a  $k$ -stage FFT computation of  $N$  data points wherein  $k$  and  $N$  are integers, the method comprising:

20 for each one of a plurality of stage groupings of at least one stage of  $k$  stages of the  $k$ -stage FFT computation wherein the plurality of stage groupings of at least one stage collectively comprise the  $k$  stages, performing a respective plurality of operations on the  $N$  data points, each one of the 25 respective plurality of operations comprising:

performing an import of a respective plurality of sets of data points of the  $N$  data points from an external memory into an internal memory;

performing a FFT computation upon each one of the respective plurality of sets of data points of the N data points in the internal memory; and

5 performing an export of the respective plurality of sets of data points of the N data points from the internal memory into the external memory to update the respective plurality of sets of data points of the N data points in the external memory.

3. A method of performing a k-stage FFT computation  
10 according to claim 2 comprising performing an import of a next respective plurality of sets of data points of the N data points for a next one of the respective plurality of operations while the performing a FFT computation upon each one of the respective plurality of sets of data points of the N data points in the internal memory.

4. A method of performing a k-stage FFT computation according to claim 2 comprising performing an export of a previous respective plurality of sets of data points of the N data points for a previous one of the respective plurality of operations while the performing a FFT computation upon each one of the respective plurality of sets of data points of the N data points in the internal memory.

5. A method of performing a k-stage FFT computation according to claim 2 wherein the FFT computation is performed  
25 using decimation in frequency (DIF).

6. A method of performing a k-stage FFT computation according to claim 2 wherein the FFT computation is performed using decimation in time (DIT).

7. A method of performing a  $k$ -stage FFT computation according to claim 2 wherein  $N$  substantially satisfies  $N = 2^k$ .

8. A method of performing a  $k$ -stage FFT computation according to claim 7 wherein the stage groupings of at least 5 one stage comprise  $Q$  stage groupings each comprising  $S$  stages, said  $Q$  and said  $S$  being integers with  $Q \geq 1$ ,  $S \geq 1$  and  $k \geq QS$ .

9. A method of performing a  $k$ -stage FFT computation according to claim 8 comprising:

grouping, within a stage grouping,  $I$ , of the  $Q$  stage 10 groupings wherein  $I$  is an integer satisfying  $0 \leq I \leq Q-1$ , the  $N$  data points into  $2^{is}$  blocks of data of  $N/2^{is}$  respective data points;

grouping, within each one of the  $2^{is}$  blocks of data, the  $N/2^{is}$  respective data points into  $2^s$  sub-blocks of data of 15  $N/2^{is+s}$  respective data points;

grouping, within each one of the  $2^s$  sub-blocks of data, the  $N/2^{is+s}$  respective data points into  $M_1$  sub-sub-blocks of data of  $N/(2^{is+s} M_1)$  respective data points wherein  $M_1$  is an integer;

20 taking, within each one of the  $2^{is}$  blocks of data, a respective sub-sub-block of data of the  $M_1$  sub-sub-blocks of data from each one of the  $2^s$  sub-blocks of data to form a set of  $2^s$  sub-sub-blocks of data of  $N/(2^{is} M_1)$  data points a total of  $M_1$  times to produce  $M_1$  sets of  $2^s$  sub-sub-blocks of data of 25  $N/(2^{is} M_1)$  data points; and

taking, within each one of the  $M_1$  sets of  $2^s$  sub-sub-blocks of data, a respective data point from each one of the  $2^s$

13. A method of performing a k-stage FFT computation according to claim 10 wherein the performing an S-stage FFT computation of the k-stage FFT computation upon each set of  $2^s$  data points of the respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points 5 results in  $N/(2^{IS+s} M_1)$  S-stage FFT computations, the  $N/(2^{IS+s} M_1)$  S-stage FFT computations comprising performing one stage of S stages for each one of the  $N/(2^{IS+s} M_1)$  S-stage FFT computations and iterating through the S stages.

14. A method of performing a k-stage FFT computation 10 according to claim 10 wherein the performing an S-stage FFT computation of the k-stage FFT computation upon each set of  $2^s$  data points of the respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points results in  $N/(2^{IS+s} M_1)$  S-stage FFT computations, the  $N/(2^{IS+s} M_1)$  S-stage FFT computations comprising performing computations for 15 S stages of an S-stage FFT computation of one of the  $N/(2^{IS+s} M_1)$  S-stage FFT computations before another S-stage FFT computation is performed on another of one of the  $N/(2^{IS+s} M_1)$  S-stage FFT computations.

15. A method of performing a k-stage FFT computation 20 according to claim 8 wherein the stage groupings of at least one stage comprise a stage grouping comprising D stages wherein D is an integer and  $k=QS+D$ .

16. A method of performing a k-stage FFT computation according to claim 15 wherein the performing an import 25 comprises importing the plurality of sets of data points into a buffer in the internal memory, the buffer having a capacity to hold  $M_1$  data points and D being an integer substantially satisfying  $D \leq \log_2(M_1)$ .

17. A method of performing a k-stage FFT computation 30 according to claim 15 wherein the performing an import

comprises importing the plurality of sets of data points into one of two buffers of a double buffer in the internal memory, each one of the two buffers having a capacity to hold  $M_I/2$  data points and  $D$  being an integer substantially satisfying

5  $D \leq \log_2(M_I/2)$ .

18. A method of performing a  $k$ -stage FFT computation according to claim 15 comprising, for each one of a plurality of sub-stage groupings of at least one stage of the stage grouping of  $D$  stages wherein the plurality of sub-stage groupings of at least one stage collectively comprise the  $D$  stages, performing a set of sub-stage FFT computations wherein for each one of the sub-stage FFT computations respective data points and respective twiddle factors are loaded into a high-speed cache.

10 15 19. A processing platform implementing the method of claim 2.

20. A processor comprising:

an internal memory adapted to store data;

20 a DMA (direct memory access) unit adapted to import data into the internal memory and to export data from the internal memory; and

25 a central processor unit (CPU) adapted to perform, for each one of a plurality of stage groupings of at least one stage of  $k$  stages of the  $k$ -stage FFT computation wherein the plurality of stage groupings of at least one stage collectively comprise the  $k$  stages, a respective plurality of operations on the  $N$  data points, for each one of the respective plurality of operations the CPU is adapted to:

provide instructions to the DMA unit for performing an import of a respective plurality of sets of data points, of N data points, into the internal memory;

5 perform a FFT computation upon each one of the respective plurality of sets of data points, of the N data points, in the internal memory; and

10 provide instructions to the DMA unit for performing an export of the respective plurality of sets of data points, of the N data points, from the internal memory to update the respective plurality of sets of data points of the N data points.

21. A processor according to claim 20 further comprising an external memory adapted to store the N data points.

22. A processor according to claim 21 comprising at least 15 one bus wherein the DMA unit is further adapted to import the respective plurality of sets of data points, of the N data points, from the external memory and to export the respective plurality of sets of data points, of the N data points, into the external memory through the at least one bus.

20 23. A processor according to claim 20 wherein the internal memory comprises:

a double buffer comprising two buffers each adapted to stored data points;

25 two buses each adapted to provide a channel for importing the data points into a respective one of the two buffers and for exporting the data points from the respective one of the two buffers; and

the CPU being further adapted to provide instructions to the DMA unit for importing a respective plurality of sets of data points, of the N data points, into one of the two buffers while the CPU performs FFT computations upon another respective 5 plurality of sets of data points, of the N data points, stored in another one of the two buffers.

24. A processor according to claim 20 wherein the internal memory comprises:

10 a double buffer comprising two buffers each adapted to stored data points;

two buses each adapted to provide a channel for importing the data points into a respective one of the two buffers and for exporting the data points from the respective one of the two buffers; and

15 the CPU being further adapted to provide instructions to the DMA unit for exporting respective sets of data points from one of the two buffers while the CPU performs FFT computations upon other respective sets of data points stored in another one of the two buffers.

20 25. A processor according to claim 20 wherein N substantially satisfies  $N = 2^k$  with k being an integer and wherein the stage groupings of at least one stage comprise Q stage groupings each comprising S stages, said Q and said S being integers with  $Q \geq 1$ ,  $S \geq 1$  and  $k \geq QS$ .

25 26. A processor according to claim 20 wherein the CPU is adapted to:

group, within a stage grouping, I, of the Q stage groupings wherein I is an integer satisfying  $0 \leq I \leq Q-1$ , the N

data points into  $2^{IS}$  blocks of data of  $N/2^{IS}$  respective data points;

group, within each one of the  $2^{IS}$  blocks of data, the  $N/2^{IS}$  respective data points into  $2^S$  sub-blocks of data of  $N/2^{IS+S}$  5 respective data points;

group, within each one of the  $2^S$  sub-blocks of data, the  $N/2^{IS+S}$  respective data points into  $M_1$  sub-sub-blocks of data of  $N/(2^{IS+S} M_1)$  respective data points wherein  $M_1$  is an integer;

take, within each one of the  $2^{IS}$  blocks of data, a 10 respective sub-sub-block of data of the  $M_1$  sub-sub-blocks of data from each one of the  $2^S$  sub-blocks of data to form a set of  $2^S$  sub-sub-blocks of data of  $N/(2^{IS} M_1)$  data points a total of  $M_1$  times to produce  $M_1$  sets of  $2^S$  sub-sub-blocks of data of  $N/(2^{IS} M_1)$  data points; and

15 take, within each one of the  $M_1$  sets of  $2^S$  sub-sub-blocks of data, a respective data point from each one of the  $2^S$  sub-blocks of data to form a set of  $2^S$  data points a total of  $N/(2^{IS+S} M_1)$  times to produce  $N/(2^{IS+S} M_1)$  sets of  $2^S$  data points.

27. A processor according to claim 26 wherein, for the 20 stage grouping, I, of the Q stage groupings of the plurality of stage groupings, the CPU is adapted to perform  $M_1$  of the plurality operations for each one of the  $2^{IS}$  blocks of data within the stage grouping, I, and for each one of the  $M_1$  of the plurality operations the CPU is adapted to:

25 provide instructions to the DMA unit for performing an import of respective  $N/(2^{IS+S} M_1)$  sets of  $2^S$  data points into the internal memory;

performing an S-stage FFT computation of the k-stage FFT computation upon each set of  $2^s$  data points of the respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points in the internal memory; and

5 provide instructions to the DMA unit for performing an export of the respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points from the internal memory to update the respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points.

28. A processor according to claim 27 wherein the  
10 internal memory comprises a buffer adapted to store the plurality of sets of data points, wherein the buffer has a capacity to hold  $M_I$  data points and  $M_1$  substantially satisfies  $M_1 = N/(2^{IS}M_I)$  wherein  $M_I$  in an integer.

29. A processor according to claim 27 wherein the  
15 internal memory comprises a double buffer adapted to store the plurality of sets of data points, wherein the double buffer comprises two buffers each capable of storing  $M_I/2$  data points and wherein said  $M_1$  substantially satisfies  $M_1 = N/(2^{IS-1}M_I)$ ,  $M_I$  being an integer.

20 30. A processor according to claim 29 wherein while the CPU performs the S-stage FFT computation of the k-stage FFT computation upon each set of  $2^s$  data points of the respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points, the respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points being stored in one of the two buffers, the  
25 CPU is adapted to provide instructions to the DMA unit to import, into another one of the two buffers, respective  $N/(2^{IS+s}M_1)$  sets of  $2^s$  data points of a next one of the each one of the  $M_1$  of the plurality operations.

31. A processor according to claim 29 wherein while the  
30 CPU performs the S-stage FFT computation of the k-stage FFT

computation upon each set of  $2^s$  data points of the respective  $N/(2^{IS+S}M_1)$  sets of  $2^s$  data points, the respective  $N/(2^{IS+S}M_1)$  sets of  $2^s$  data points being stored in one of the two buffers, the CPU is adapted to provide instructions to the DMA unit to 5 export, from another one of the two buffers, respective  $N/(2^{IS+S}M_1)$  sets of  $2^s$  data points from previous one of the each one of the  $M_1$  of the plurality operations.

32. A processor according to claim 27 wherein the performing an S-stage FFT computation of the k-stage FFT 10 computation upon each set of  $2^s$  data points of the respective  $N/(2^{IS+S}M_1)$  sets of  $2^s$  data points results in  $N/(2^{IS+S}M_1)$  S-stage FFT computations, the CPU being further adapted to evaluate the  $N/(2^{IS+S}M_1)$  S-stage FFT computations at one stage of S stages for each one of the  $N/(2^{IS+S}M_1)$  S-stage FFT computations and 15 iterate through the S stages.

33. A processor according to claim 27 wherein the internal memory comprises a high-speed cache.

34. A processor according to claim 33 wherein the performing an S-stage FFT computation of the k-stage FFT 20 computation upon each set of  $2^s$  data points of the respective  $N/(2^{IS+S}M_1)$  sets of  $2^s$  data points results in  $N/(2^{IS+S}M_1)$  S-stage FFT computations, the CPU being further adapted to perform computations for S stages of an S-stage FFT computation of one of the  $N/(2^{IS+S}M_1)$  S-stage FFT computations before performing 25 another S-stage FFT computation of another one of the  $N/(2^{IS+S}M_1)$  S-stage FFT computations.

35. A processor according to claim 28 wherein the stage groupings of at least one stage comprise a stage grouping of D stage wherein D is an integer, said k substantially satisfying 30  $k=QS+D$  and said D substantially satisfying  $D \leq \log_2(M_1)$ .

36. A processor according to claim 29 wherein the stage groupings of at least one stage comprise a stage grouping of D stage wherein D is an integer, said k substantially satisfying  $k=QS+D$  and said D substantially satisfying  $D \leq \log_2(M_I/2)$ .

5 37. A processor according to claim 20 wherein the CPU is adapted to produce the k-stage FFT computation in real-time.

38. A processor according to claim 20 wherein the internal memory comprises a buffer adapted to store twiddle factors used in the FFT computations.

10 39. An article of manufacture comprising:

a computer usable medium having computer readable program code means embodied therein for performing a k-stage FFT computation of N data points wherein k and N are integers, the computer readable code means in said article of manufacture comprising:

computer readable code FFT means for performing, for each one of a plurality of stage groupings of at least one stage of k stages of the k-stage FFT computation wherein the plurality of stage groupings of at least one stage collectively comprise the k stages, a respective plurality of operations on the N data points, for each one of the respective plurality of operations the computer readable code FFT means comprising:

computer readable code means for providing instructions for performing an import of a respective plurality of sets of data points of the N data points from an external memory into an internal memory;

TOP SECRET//  
REF ID: A6512

computer readable code means for performing a FFT computation upon each one of the respective plurality of sets of data points of the N data points in the internal memory; and

computer readable code means for providing  
5 instructions for performing an export of the respective plurality of sets of data points of the N data points from the internal memory into the external memory to update the respective plurality of sets of data points of the N data points in the external memory.

10 40. An article of manufacture according to claim 39 wherein the stage groupings of at least one stage comprise a stage grouping of D stage and Q stage groupings of S stages wherein D, Q and S are integers and wherein said k substantially satisfies  $k=QS+D$  and  $k = \log_2(N)$ .

15 41. An article of manufacture according to claim 40 comprising further computer readable code means for:

grouping, within a stage grouping, I, of the Q stage groupings wherein I is an integer satisfying  $0 \leq I \leq Q-1$ , the N data points into  $2^{IS}$  blocks of data of  $N/2^{IS}$  respective data points;

grouping, within each one of the  $2^{IS}$  blocks of data, the  $N/2^{IS}$  respective data points into  $2^S$  sub-blocks of data of  $N/2^{IS+S}$  respective data points;

grouping, within each one of the  $2^S$  sub-blocks of data, the  $N/2^{IS+S}$  respective data points into  $M_1$  sub-sub-blocks of data of  $N/(2^{IS+S} M_1)$  respective data points wherein  $M_1$  is an integer;

5 taking, within each one of the  $2^{IS}$  blocks of data, a respective sub-sub-block of data of the  $M_1$  sub-sub-blocks of data from each one of the  $2^S$  sub-blocks of data to form a set of  $2^S$  sub-sub-blocks of data of  $N/(2^{IS}M_1)$  data points a total of  $M_1$  times to produce  $M_1$  sets of  $2^S$  sub-sub-blocks of data of  $N/(2^{IS}M_1)$  data points; and

10 taking, within each one of the  $M_1$  sets of  $2^S$  sub-sub-blocks of data, a respective data point from each one of the  $2^S$  sub-blocks of data to form a set of  $2^S$  data points a total of  $N/(2^{IS+S}M_1)$  times to produce  $N/(2^{IS+S}M_1)$  sets of  $2^S$  data points.

42. An article of manufacture according to claim 41 wherein, for the stage grouping, I, of the of the Q stage groupings of the plurality of stage groupings, the computer readable code FFT means is further adapted to perform  $M_1$  of the plurality operations for each one of the  $2^{IS}$  blocks of data within the stage grouping, I, and for each one of the  $M_1$  of the plurality operations the computer readable code FFT means further comprising:

20 computer readable code means for providing instructions for performing an import of respective  $N/(2^{IS+S}M_1)$  sets of  $2^S$  data points from the external memory into the internal memory;

25 computer readable code means for performing an S-stage FFT computation of the k-stage FFT computation upon each set of  $2^S$  data points of the respective  $N/(2^{IS+S}M_1)$  sets of  $2^S$  data points in the internal memory; and

computer readable code means for providing instructions for performing an export of the respective  $N/(2^{IS+S}M_1)$  sets of  $2^S$  data points from the internal memory into the external memory to update the respective  $N/(2^{IS+S}M_1)$  sets of 5  $2^S$  data points in the external memory.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100